Ten (not so) simple rules for clinical trial data-sharing

Clinical trial data-sharing is seen as an imperative for research integrity and is becoming increasingly encouraged or even required by funders, journals, and other stakeholders. However, early experiences with data-sharing have been disappointing because it is not always conducted properly. Health data is indeed sensitive and not always easy to share in a responsible way. We propose 10 rules for researchers wishing to share their data. These rules cover the majority of elements to be considered in order to start the commendable process of clinical trial data-sharing: Rule 1: Abide by local legal and regulatory data protection requirements; Rule 2: Anticipate the possibility of clinical trial data-sharing before obtaining funding; Rule 3: Declare your intent to share data in the registration step; Rule 4: Involve research participants; Rule 5: Determine the method of data access; Rule 6: Remember there are several other elements to share; Rule 7: Do not proceed alone; Rule 8: Deploy optimal data management to ensure that the data shared is useful; Rule 9: Minimize risks; Rule 10: Strive for excellence.

Introduction
Clinical trial data-sharing is seen as an imperative for research integrity and is becoming increasingly encouraged or even required by funders, journals, and other stakeholders. For example, the White House Office of Science and Technology Policy is now requiring that all data from taxpayer-funded studies be shared without cost or imposition of embargos [1]. Sharing and reusing clinical trial data maximizes its utility [2]. For example, the reanalysis of data from a clinical trial makes it possible to confirm or disprove its results; secondary analyses make way for the exploration of research questions not considered initially, and meta-analyses on individual participant data (IPD) show promise for the field of evidence syntheses. However, early experiences are disappointing with regard to the actual practical implementation of clinical trial data-sharing [3]. Health data is indeed sensitive and not always easy to share in a responsible way. Nevertheless, responsible reuse of data carried out with the best research standards is not only desirable but also possible, provided it is well anticipated. International institutions in some cases provide researchers with comprehensive and complex policies and implementation guidance for sharing data [4]. We propose 10 (not so) simple rules for researchers wishing to share data, aligned with the clinical trial lifecycle (Fig 1). These rules cover the majority of elements to be considered in order to start the commendable process of clinical trial data-sharing and to facilitate its efficient use for a new research question.

Rule 1: Abide by local legal and regulatory data protection requirements
Many legal and regulatory texts govern data-sharing and reuse, especially in terms of data protection. Among the strongest requirements, the European General Data Protection Regulation applies to any organization established in the territory of the European Union or processing data from people residing there. Among other requirements, it imposes the design and maintenance of specific records (e.g., records of processing activities); the provision of information about the reuse is mandatory and the processing of health data is prohibited without a specific exemption, for example for research. In the United States today, the situation is more fragmented and there are both federal and state initiatives and laws that need to be complied with.
It is important to familiarize oneself with these requirements, as they also change over time [5].
Since regulatory requirements differ across the world and no harmonization yet exists, it is important to consider the regulatory context of both the data generator and the re-user. For example, a definition of anonymization in the US may not fit in the European context, making the resulting regulatory formalities much more cumbersome. A harmonization of texts to govern clinical trial data-sharing and a simplification of procedures are both certainly needed. This complex regulatory environment reinforces the need for good communication between data generators and data re-users in order to align on the essential information about the requirements in force and the procedures that are to be followed. Thus, researchers should seek support from their institutions and approach the relevant data teams where these can be identified.

Rule 2: Anticipate the possibility of clinical trial data-sharing before obtaining funding
If one wants to share data, it is important to recall that data-sharing is not resource- or cost-free: data collection, preservation, preparation, and storage in standardized formats, and completion of regulatory and administrative formalities can be time-consuming and resource-intensive processes [6]. A very common barrier to data-sharing cited by trialists is the lack of dedicated financial resources, especially if these costs were not foreseen at the time when the trial obtained funding [7]. It is more and more important to think about data-sharing when the trial is being designed and to request funding dedicated to this activity. This seems feasible, as clinical trial funders are increasingly encouraging researchers to consider how their data will be shared in the future [8]. For example, the National Institutes of Health (NIH) has implemented a new data-sharing policy for the research it funds from January 2023 [9]. In France, 5 research-funding agencies (Ademe, Anses, ANR, ANRS|MIE, INCa) have created a working network to harmonize policies regarding open science, and in 2022, they introduced a requirement for a data management plan (DMP) [10] for every project approved for funding [11]. Some funders even mandate data-sharing [12]. For example, in Canada, this is the case for the 3 federal funders (Tri-Agency) [13]. Data-sharing is set to become part of grant assessments. For clinical trialists and their respective institutions, to remain competitive, it will be important to adhere to these new requirements. To further educate the research community, and to facilitate data-sharing, the Digital Research Alliance of Canada [14] has recently funded 18 pilot Data Champion programs across the country. In all cases, we recommend that the data flow, including its potential future reuse, should be documented in the DMP in the design stage of the trial. This document, which describes the data collected in a research project, and how it will be structured, shared, and stored, is now mandatory in many institutions. In addition, in the design stages of the clinical trial, researchers should ask their institutions if they have already put in place tools and documents for sharing data. Researchers can draw benefit from existing documents and procedures.

Rule 3: Declare your intent to share data in the registration step
It is now well established that the key features of clinical trials must be registered before enrolling the first participant [15]. Data-sharing statements are now part of these features: the International Committee of Medical Journal Editors (ICMJE) requires authors to include a statement on data-sharing as part of the clinical trial registration [2]. To be clear, the ICMJE does not enforce data-sharing, so researchers can still answer "no" to this data-sharing plan. Indeed, most data-sharing statements do not lead to prompt, widely available sharing [16] and there are still many obstacles or prerequisites. The statement can be updated on registries to report any change. Importantly, when the trial is over, the data-sharing statement will also be compulsory in the published paper; here again, the data-sharing is not compulsory but the statement is. These data-sharing statements must indicate (i) if individual data will be shared; (ii) which specific data will be shared; (iii) if other documents will be available (including statistical codes); (iv) when the data will be available and for how long; and (v) how access will be provided. Drafting the data-sharing statement is an opportunity to consider how to remove as many obstacles as possible and how to enhance communication with potential re-users in the future. To ensure continuity, named addresses (e.g., corresponding authors) should be avoided, and a valid and permanent email address of the organization that implemented the clinical trial should be provided. The burden of data-sharing should be on the institution, not on the individual researcher; we will return to this later (see Rule 7). A data request form can be developed to facilitate the first exchanges between the organizations of the requester and the data generator and should be accessible for direct download by the re-user, for example, with the annexes of the manuscript or by a link provided in the statement. Here again, researchers should always ask their institutions if such documents have already been developed.

Rule 4: Involve research participants
A participant-centered approach implies that research projects are carried out "in collaboration with" and not "for" participants. Trial participants should be fully informed of the data-sharing plans prior to entering the trial. When participating in a clinical trial, patients provide their consent to risks and constraints for an uncertain benefit. As data cannot be perfectly anonymized, data-sharing carries a new risk with the possibility of re-identification. Patients must be informed about the use of their data for a different objective from that of the initial clinical trial, about the corresponding risk, and the appropriate safeguards. Informing patients of any possible risks is thus an ethical requirement (to ensure trust, to protect individuals) supporting responsible data-sharing [17] and also a regulatory requirement in some areas, such as Europe. To inform the participants, researchers should provide them with additional information on each specific reuse, check for any potential objection to the reuse and, if there is any, remove the data concerned. Sometimes this objection can target a type of re-user (e.g., industry) or a purpose (e.g., purely commercial). To inform trial participants about reuses, we recommend the use of a website whose address is provided in advance, for example, in the informed consent form of the trial. Suggested wording for informed consent documents is provided by the Inter-university Consortium for Political and Social Research (ICPSR) [18].

Rule 5: Determine the method of data access
When data cannot be directly available, the organization that has generated it may have assigned a data access committee or similar body to review the appropriateness of requests for reuse (see Rule 7).
After checking the relevance of the request, the method of data access is usually chosen by the institution on the basis of the security and cost-effectiveness of implementation. For example, to best secure data-sharing, remote access to an infrastructure that incorporates a login option should be used. Remote access to the data also enables the user's rights to be restricted (e.g., by preventing the copying and pasting of data). To achieve this high level of security, it is often necessary to use an external service provider that specializes in providing remote servers. In this spirit, many repositories have been implemented in the USA, such as Vivli and the Yale Open Data Access (YODA) program [19], that already have their own requirements to control and secure data-sharing. Most data from trials that is shareable to date is stored by the industry in these repositories and the rules are already fixed. Researchers can ask their institutions which repository is recommended or use the TRUST principles to ensure that they are reliable [20]. Sometimes it is possible to download the data directly locally onto the re-user's server; this method has the disadvantage of lowering the level of traceability of the actions carried out on the dataset after sharing. Some tools such as DataSHIELD [21] propose the running of analyses remotely without physically accessing the dataset and only seeing the results (this is called querying the data). Free data download, as proposed by some repositories like Dryad (3), or direct online access avoids the data request steps, but this should be reserved for anonymized data for security reasons. This approach can still be implemented with certain contractual clauses that should be accepted at the time of the download to make the re-user accountable.
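The idea of querying data without seeing individual records can be illustrated with a minimal Python sketch. This is not DataSHIELD's interface: the function name `safe_mean`, the variable names, and the minimum group size of 5 are all assumptions made for the example. The function returns group-level summaries only and suppresses any group with too few participants to report.

```python
from statistics import mean

# Hypothetical disclosure threshold; in practice it would be set by the data holder.
MIN_CELL_SIZE = 5


def safe_mean(records, variable, group_by, min_cell_size=MIN_CELL_SIZE):
    """Return per-group counts and means, suppressing groups that are too small."""
    groups = {}
    for row in records:
        groups.setdefault(row[group_by], []).append(row[variable])
    results = {}
    for key, values in groups.items():
        if len(values) < min_cell_size:
            results[key] = None  # suppressed: too few participants to report
        else:
            results[key] = {"n": len(values), "mean": mean(values)}
    return results


# Toy records standing in for a trial dataset that stays on the secure server.
records = (
    [{"arm": "treatment", "sbp": v} for v in (120, 125, 130, 135, 140)]
    + [{"arm": "control", "sbp": v} for v in (138, 142)]
)
summary = safe_mean(records, "sbp", "arm")
print(summary)
```

In this toy run the control arm contains only two records, so its summary is withheld; the re-user never receives row-level values for either arm.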

Rule 6: Remember there are several other elements to share
The importance of sharing clinical trial data should not overshadow the basic requirement of complete and transparent reporting of clinical trial results [22]. This includes prospective registration and full reporting of summary results, 2 basic steps that should precede IPD sharing. Prospective registration of the initial study is an essential prerequisite and it enables the main elements of the protocol and its main modifications to be traceable. The reporting of the summary results of the initial study then enables the main results to be communicated in the form of aggregated data. Statistical codes can also easily be shared with fewer restrictions because they do not contain individual data, but they are not so easy to understand and reuse; the name and version of the software and libraries used should be specified along with all other details to make the research reproducible [23]. The most important documents that should be shared are listed in Table 1.
With the possibility of verifying the results, performing secondary analyses, or combining studies in the form of IPD meta-analyses, the correct sharing of data then becomes the icing on the cake.
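Recording the name and version of the software and libraries used can be partly automated. The following Python sketch, which uses only the standard library, writes out the interpreter version, the platform, and the installed version of each listed package. The function name `environment_record` and the package names passed to it are illustrative assumptions; a real analysis would list its own dependencies.

```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages):
    """Collect interpreter, platform, and package versions for a reproducibility note."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Package names here are placeholders for the libraries an analysis actually uses.
record = environment_record(["pip", "a-package-that-does-not-exist"])
print(json.dumps(record, indent=2))
```

The resulting JSON can be stored alongside the shared statistical code so that re-users know which environment produced the original results.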

Rule 7: Do not proceed alone
Researchers should contact their institutions and seek support from them as soon as they are planning to share data. Indeed, all sponsors, academic or industry, need to set up a governance system for data-sharing and adopt suitable DMPs. To ensure the follow-up of requests, organizations should identify a data reuse coordinator (i.e., the person behind the non-nominative email address described in Rule 3). As a local expert, the data reuse coordinator should be able to guide the re-user in complying with local regulations when necessary. In addition, organizations should set up an independent committee to evaluate requests for reuse of data. This committee will independently evaluate and accept requests for data reuse, as required, on the basis of their scientific and ethical relevance as well as the regulatory, contractual, and financial feasibility. At the very least, to avoid the risk of competing interests, researchers related to the requested study should be excluded from the deliberations of these committees. Lastly, in line with the Hong Kong Principles [24], institutions are invited to implement incentives that reward practices that promote reproducible science, e.g., data-sharing. Many universities, such as the University of Cambridge [25], have therefore developed "data champions programs" that acknowledge the value of good data management. This observation brings us directly to Rule 8.

Rule 8: Deploy optimal data management to ensure that the data shared is useful
Preparing data for sharing is an essential step that should not be neglected. It requires technical skills and knowledge of regulations to ensure that the data corresponds to the requests of the re-users while respecting regulations. Data management should always aim for compliance with the FAIR principles [26], which ensure that data is Findable [27], Accessible, Interoperable, and Reusable. Similarly, for clinical trial data that includes indigenous participants, the CARE principles [28] should be consulted. The database should use a universal computerized format to enable new research teams to use it easily. Data and metadata should be standardized and preferably in English. In other words, clinical trial data cannot be kept in an ivory tower nor in the Tower of Babel [29]. Metadata should always be published to enable re-users to easily identify studies of interest. The statistical codes should be annotated to make them understandable by any other statistician [23]. To help the citation of reused datasets, we advise data generators to obtain a DOI name (digital object identifier) for each dataset. Often the entire dataset does not need to be shared. In this case, researchers should create and share a dataset containing only the data needed for reuse. They should also ask re-users to comply with the data citation principles [30] and require them to cite the relevant DOIs in all publications of that reused data.
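As one possible illustration of publishing standardized metadata, the Python sketch below assembles a small machine-readable record for a shared dataset and refuses to proceed if any shared column lacks a description. The field names, the data dictionary, and the DOI placeholder are hypothetical; they do not follow any particular metadata standard, and an institution or repository would normally prescribe its own schema.

```python
import json

# Hypothetical data dictionary: variable name -> human-readable label and unit.
DATA_DICTIONARY = {
    "arm": {"label": "Randomized arm", "unit": None},
    "age_years": {"label": "Age at randomization", "unit": "years"},
    "sbp_mmhg": {"label": "Systolic blood pressure at 12 weeks", "unit": "mmHg"},
}


def build_metadata(title, doi, language, dictionary, columns):
    """Assemble a metadata record, checking that every shared column is documented."""
    missing = [c for c in columns if c not in dictionary]
    if missing:
        raise ValueError(f"Undocumented columns: {missing}")
    return {
        "title": title,
        "identifier": doi,
        "language": language,
        "variables": {c: dictionary[c] for c in columns},
    }


record = build_metadata(
    title="Example trial, reuse subset",  # illustrative title
    doi="10.xxxx/placeholder",            # placeholder, not a real DOI
    language="en",
    dictionary=DATA_DICTIONARY,
    columns=["arm", "sbp_mmhg"],
)
print(json.dumps(record, indent=2))
```

Only the columns actually shared appear in the record, so the metadata describes the reuse subset rather than the full trial database.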

Rule 9: Minimize risks
Data-sharing is a risky endeavor from a data protection perspective, because of the risk of re-identification. However, safeguards exist to mitigate this. If the data is publicly available, it should be anonymized. Anonymizing data raises more than the issue of potential loss of information from the dataset: researchers should also expect to undertake a large-scale task, since the rules ensuring the anonymization of data are numerous and not necessarily concordant between countries. For example, one will not obtain the same set of data by following the Health Insurance Portability and Accountability Act (HIPAA) [31] in the US or the data protection working group recommendations [32] in Europe. When anonymization is not possible, researchers can use pseudonymized data, i.e., datasets with no directly identifying data. As re-identification of individuals is possible from these datasets, it is necessary to ensure that the recipients are specified by the protocol and that they are not able to perform analyses that were not agreed on beforehand. Current measures to secure the process often involve (i) establishing a data-sharing agreement that defines the data use rights and obligations; (ii) minimizing the data transmitted, i.e., giving access only to the variables that are strictly necessary for the data reuse; and (iii) choosing the method of data access wisely. These security measures sometimes complicate sharing and can therefore appear restrictive, but they are a guarantee of confidence.
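Pseudonymization and data minimization can be sketched in a few lines of Python. The example below replaces a direct identifier with a keyed hash (HMAC-SHA256, one common technique among several, chosen here for illustration) and keeps only the variables approved for the reuse. The field names, the list of direct identifiers, and the key are assumptions. The output is pseudonymized, not anonymized: the remaining variables may still allow re-identification, so the other safeguards described above still apply.

```python
import hashlib
import hmac

# Hypothetical list of directly identifying fields; the real list depends on the trial.
DIRECT_IDENTIFIERS = {"name", "email", "hospital_number"}


def pseudonymize(rows, secret_key, id_field, keep):
    """Replace the identifier with a keyed hash and keep only approved variables."""
    blocked = [v for v in keep if v in DIRECT_IDENTIFIERS]
    if blocked:
        raise ValueError(f"Direct identifiers cannot be shared: {blocked}")
    shared = []
    for row in rows:
        digest = hmac.new(secret_key, str(row[id_field]).encode("utf-8"), hashlib.sha256)
        record = {"pseudo_id": digest.hexdigest()[:16]}
        record.update({variable: row[variable] for variable in keep})
        shared.append(record)
    return shared


# Toy rows; the key is illustrative and would stay with the data generator.
rows = [
    {"hospital_number": "H-1001", "name": "A. Example", "arm": "treatment", "sbp_mmhg": 128},
    {"hospital_number": "H-1002", "name": "B. Example", "arm": "control", "sbp_mmhg": 141},
]
key = b"illustrative-secret-key"
shared = pseudonymize(rows, key, "hospital_number", ["arm", "sbp_mmhg"])
print(shared)
```

Because the hash is keyed, the same participant receives the same code across extracts made with the same key, while a recipient without the key cannot recompute the code from a known hospital number.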

Rule 10: Strive for excellence
The very idea behind an ethical obligation for clinical trial data-sharing is that researchers should comply with the wishes of participants, who put themselves at risk during the clinical trial, by sharing their data and favoring the best possible use of it [2]. This means that researchers should require data re-users to adopt the highest standards, especially regarding reproducible research practices [33] such as prospective registration of data reuses and the publication of results according to reporting guidelines. It is essential to be as stringent for re-use protocols as for the initial clinical trials. This is of major importance because data reuses can often carry a higher risk of false-positive findings in new analyses than in the initial analysis of the trial which, in theory, has strictly controlled for type 1 and type 2 errors. These exploratory conditions, combined with potential researcher biases, are fertile ground for nonreproducible findings in secondary data analysis, especially when appropriate safeguards, i.e., open science practices, are not in place [34]. Data-sharing efforts should not enable nontransparent selective reporting of new results. In addition, reuse of data can be challenging and often requires good communication between the re-users and the data generators [7]. Ultimately, at the end of this long journey, researchers may well be happy to have shared their data. There are now incentives for the best re-users (e.g., the Parasite Award [35]) and also for data generators (e.g., the Research Symbiont Awards [36]). After all, science should celebrate the dedication of researchers in following these 10 not-so-simple rules.
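The higher risk of false-positive findings in unplanned secondary analyses can be made concrete with a short calculation. Under the simplifying assumption of independent tests, each run at significance level alpha with no true effects, the probability of at least one false positive among k tests is 1 - (1 - alpha)^k. The Python sketch below evaluates this; the function name is illustrative and the independence assumption rarely holds exactly in practice.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 5 0.226
# 10 0.401
# 20 0.642
```

With 10 unregistered exploratory tests at the usual 5% level, the chance of at least one spurious "significant" result is therefore about 40%, which is one reason for prospectively registering reuse protocols.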

Conclusion
These 10 simple rules provide various features that will facilitate clinical trial data-sharing. They are not so simple, and in fact rather complex, but Fig 2 attempts to summarize this complexity in a useable way, so if you don't get it yet, at least have fun with this:
1. We are all after it and research is often not possible without it.
2. Everyone talks about it but especially the ICMJE, researchers have to publish it in a clinical trials registry and in their articles. This is the data-sharing...
3. Characteristics of the secure server enabling access to the data.
4. Prerequisite for involving participants and ensuring that they do not object to reuse.
5. Step to communicate key elements from the original study protocol prospectively; also recommended for communicating the reuse protocol.
6. Each organization implementing clinical studies should identify this person to manage reuse requests.
7. Principles for ensuring that data is Findable, Accessible, Interoperable, and Reusable.
8. A document that defines the rights and obligations that the data generator and data re-user agree to in any data reuse project.
9. Enables the results of data reuse to be communicated and thus complies with the principles of reproducible research.
10. Unfortunately, they are not universal, but we will have to comply with them for each reuse.